Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning
Igor Colin, Ludovic Dos Santos, Kevin Scaman
We investigate the theoretical limits of pipeline parallel learning of deep learning architectures, a distributed setup in which the computation is distributed per layer instead of per example. For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal. For non-smooth convex functions, we provide a novel algorithm coined Pipeline Parallel Random Smoothing (PPRS) that is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension. While the method still suffers from the slow $\varepsilon^{-2}$ rate of non-smooth optimization, the depth-dependent part is accelerated, resulting in a near-linear speed-up and a convergence time that only mildly depends on the depth of the deep learning architecture. Finally, we perform an empirical analysis of the non-smooth non-convex case and show that, for difficult and highly non-smooth problems, PPRS outperforms more traditional optimization algorithms such as gradient descent and Nesterov's accelerated gradient descent in regimes where the sample size is limited, such as few-shot or adversarial learning.
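As a rough illustration of the randomized-smoothing idea behind PPRS, one can replace a non-smooth objective $f$ by its Gaussian smoothing $f_\gamma(x) = \mathbb{E}_{Z \sim \mathcal{N}(0, I)}[f(x + \gamma Z)]$, estimate $\nabla f_\gamma$ by Monte Carlo averaging, and feed the estimate to Nesterov's accelerated method. The sketch below is illustrative only, under assumptions: the function names, smoothing parameter, step sizes, sample count, and the test objective are all chosen for illustration and are not the authors' implementation (in particular, it omits the pipeline parallel dispatch across layers that is the paper's focus).

import numpy as np

def smoothed_gradient(grad_f, x, gamma, num_samples, rng):
    # Monte Carlo estimate of the gradient of the smoothed objective
    # f_gamma(x) = E_{Z ~ N(0, I)}[ f(x + gamma * Z) ].
    g = np.zeros_like(x)
    for _ in range(num_samples):
        z = rng.standard_normal(x.shape)
        g += grad_f(x + gamma * z)
    return g / num_samples

def nesterov_smoothed(grad_f, x0, lr=0.05, momentum=0.9,
                      steps=200, gamma=0.1, num_samples=10):
    # Nesterov's accelerated gradient run on the smoothed objective:
    # the gradient is evaluated at the lookahead point x + momentum * v.
    rng = np.random.default_rng(0)
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        g = smoothed_gradient(grad_f, x + momentum * v, gamma, num_samples, rng)
        v = momentum * v - lr * g
        x = x + v
    return x

# Toy non-smooth problem: f(x) = ||x||_1 + 0.5 * ||x - 1||^2,
# whose minimizer is x = 0; grad_f returns a subgradient of f.
grad_f = lambda x: np.sign(x) + (x - 1.0)
x_opt = nesterov_smoothed(grad_f, np.full(5, 3.0))
print(x_opt)  # approaches the minimizer x = 0, up to smoothing bias

Larger num_samples reduces the variance of the gradient estimate, while gamma trades off the bias of the smoothing against the smoothness (and hence the amenability to acceleration) of $f_\gamma$.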
Reviews: Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning
The relationship between the proposed pipeline parallel optimization setting and existing work is not clear. Does it contain related work as special cases? The authors state in the abstract that computation is distributed per layer instead of per example; it would be helpful to provide additional comparison along this line, which is only briefly touched on in Section 2 regarding asynchronous value/gradient evaluation.
The reviewers agreed that this paper is a nice contribution to the literature and provides interesting and potentially useful convergence results in the framework of pipeline parallel optimization. The reviewers were impressed by the rebuttal and encourage the authors to incorporate the clarifications therein into the paper.